facial landmark
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- North America > Canada (0.05)
- Europe > United Kingdom > England > Greater London > London (0.05)
LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection
Ren, Jing, Ma, Suyu, Jia, Hong, Xu, Xiwei, Lee, Ivan, Fayek, Haytham, Li, Xiaodong, Xia, Feng
Detecting driver fatigue is critical for road safety, as drowsy driving remains a leading cause of traffic accidents. Many existing solutions rely on computationally demanding deep learning models, which incur high latency and are unsuitable for resource-limited embedded robotic devices (such as intelligent vehicles) where rapid detection is necessary to prevent accidents. This paper introduces LiteFat, a lightweight spatio-temporal graph learning model designed to detect driver fatigue efficiently while maintaining high accuracy and low computational demands. LiteFat converts streaming video data into spatio-temporal graphs (STGs) using facial landmark detection, which focuses on key motion patterns and reduces unnecessary data processing. It uses MobileNet to extract facial features and build the STG's feature matrix. A lightweight spatio-temporal graph neural network is then employed to identify signs of fatigue with minimal processing and low latency. Experimental results on benchmark datasets show that LiteFat performs competitively while significantly reducing computational complexity and latency compared to current state-of-the-art methods. This work advances the development of real-time, resource-efficient human fatigue detection systems that can be deployed on embedded robotic devices. Driver fatigue is a major contributor to traffic accidents worldwide, posing a significant threat to road safety [1].
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Oceania > Australia > South Australia (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Health & Medicine (1.00)
- Transportation > Ground > Road (0.68)
- Information Technology (0.66)
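The LiteFat abstract describes turning landmark sequences into a spatio-temporal graph with a feature matrix. As a minimal illustration of that general construction (the function name, edge scheme, and raw-coordinate features are my assumptions, not the authors' design), one might connect landmarks spatially within each frame and temporally across consecutive frames:

```python
import numpy as np

def build_spatio_temporal_graph(landmarks, spatial_edges):
    """Assemble a spatio-temporal graph from a facial landmark sequence.

    landmarks: (T, N, 2) array of N 2-D landmarks over T frames.
    spatial_edges: list of (i, j) landmark index pairs within a frame.
    Returns a (T*N, 2) feature matrix and a (T*N, T*N) adjacency matrix
    with spatial edges inside each frame and temporal edges linking the
    same landmark across consecutive frames.
    """
    T, N, _ = landmarks.shape
    features = landmarks.reshape(T * N, 2)
    adj = np.zeros((T * N, T * N), dtype=np.float32)
    for t in range(T):
        off = t * N
        for i, j in spatial_edges:      # intra-frame (spatial) edges
            adj[off + i, off + j] = adj[off + j, off + i] = 1.0
        if t + 1 < T:                   # inter-frame (temporal) edges
            for n in range(N):
                adj[off + n, off + N + n] = adj[off + N + n, off + n] = 1.0
    return features, adj
```

In a real pipeline the per-node features would come from a MobileNet embedding rather than raw coordinates, and the adjacency would feed a graph neural network.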
Is It Really You? Exploring Biometric Verification Scenarios in Photorealistic Talking-Head Avatar Videos
Pedrouzo-Rodriguez, Laura, Delgado-DeRobles, Pedro, Gomez, Luis F., Tolosana, Ruben, Vera-Rodriguez, Ruben, Morales, Aythami, Fierrez, Julian
Photorealistic talking-head avatars are becoming increasingly common in virtual meetings, gaming, and social platforms. These avatars allow for more immersive communication, but they also introduce serious security risks. One emerging threat is impersonation: an attacker can steal a user's avatar, preserving their appearance and voice, making fraudulent use nearly impossible to detect by sight or sound alone. In this paper, we explore the challenge of biometric verification in such avatar-mediated scenarios. Our main question is whether an individual's facial motion patterns can serve as reliable behavioral biometrics to verify their identity when the avatar's visual appearance is a facsimile of its owner's. To answer this question, we introduce a new dataset of realistic avatar videos created using a state-of-the-art one-shot avatar generation model, GAGAvatar, with genuine and impostor avatar videos. We also propose a lightweight, explainable spatio-temporal Graph Convolutional Network architecture with temporal attention pooling that uses only facial landmarks to model dynamic facial gestures. Experimental results demonstrate that facial motion cues enable meaningful identity verification, with AUC values approaching 80%. The proposed benchmark and biometric system are available to the research community in order to bring attention to the urgent need for more advanced behavioral biometric defenses in avatar-based communication systems.
- Information Technology > Security & Privacy (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
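The abstract above mentions temporal attention pooling over a spatio-temporal GCN. As a minimal sketch of that pooling idea alone (the scoring vector `w` and the single-vector parameterization are assumptions; the paper's actual layer is likely richer), frames can be combined by a softmax-weighted sum:

```python
import numpy as np

def temporal_attention_pooling(frame_embeddings, w):
    """Pool per-frame embeddings into one clip vector via attention.

    frame_embeddings: (T, D) per-frame features, e.g. GCN outputs.
    w: (D,) attention parameter scoring each frame's relevance.
    """
    scores = frame_embeddings @ w                   # (T,) relevance scores
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax weights
    return alpha @ frame_embeddings                 # (D,) weighted sum
```

With `w` at zero the weights are uniform and pooling reduces to a plain temporal mean; training `w` lets the model emphasize frames whose motion is most identity-revealing.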
Deepfake Detection Via Facial Feature Extraction and Modeling
Carter, Benjamin, Dilla, Nathan, Callahan, Micheal, Ambala, Atuhaire
The rise of deepfake technology brings forth new questions about the authenticity of various forms of media found online today. Videos and images generated by artificial intelligence (AI) have become increasingly difficult to differentiate from genuine media, resulting in the need for new models to detect artificially-generated media. While many models have attempted to solve this, most focus on direct image processing, adapting a convolutional neural network (CNN) or a recurrent neural network (RNN) that directly interacts with the video image data. This paper introduces an approach that uses solely facial landmarks for deepfake detection. Using a dataset consisting of both deepfake and genuine videos of human faces, this paper describes an approach for extracting facial landmarks for deepfake detection, focusing on identifying subtle inconsistencies in facial movements instead of raw image processing. Experimental results demonstrated that this feature extraction technique is effective across neural network models: the same facial landmarks were tested on three neural network models, with promising performance metrics indicating potential for real-world applications. The findings discussed in this paper include RNN and artificial neural network (ANN) models with accuracies of 96% and 93%, respectively, and a CNN model hovering around 78%. This research challenges the assumption that raw image processing is necessary to identify deepfake videos by presenting a facial feature extraction approach compatible with various neural network models while requiring fewer parameters.
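The key idea above is feeding classifiers motion derived from landmarks rather than pixels. A minimal sketch of one such motion representation (frame-to-frame displacements; the function name and the flat-vector layout are illustrative assumptions, not the paper's exact features):

```python
import numpy as np

def landmark_motion_features(landmarks):
    """Turn a landmark sequence into a flat motion-feature vector.

    landmarks: (T, N, 2) facial landmarks over T frames. Using
    frame-to-frame displacements instead of pixels means a downstream
    ANN/RNN/CNN sees only facial motion, with far fewer inputs than
    raw video would require.
    """
    disp = np.diff(landmarks, axis=0)   # (T-1, N, 2) displacements
    return disp.reshape(-1)             # flatten for a dense classifier
```

An RNN variant would instead consume the `(T-1, N*2)` displacement sequence frame by frame rather than the flattened vector.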
Evaluation of facial landmark localization performance in a surgical setting
Frajtag, Ines, Švaco, Marko, Šuligoj, Filip
The use of robotics, computer vision, and their applications is becoming increasingly widespread in various fields, including medicine. Many face detection algorithms have found applications in neurosurgery, ophthalmology, and plastic surgery. A common challenge in using these algorithms is variable lighting conditions and the flexibility of detection positions needed to identify and precisely localize patients. The proposed experiment tests the MediaPipe algorithm for detecting facial landmarks in a controlled setting, using a robotic arm that automatically adjusts positions while the surgical light and the phantom remain fixed. The results of this study demonstrate that surgical lighting significantly improves facial landmark detection performance at larger yaw and pitch angles. Increases in standard deviation (dispersion) occur when selected facial landmarks are detected imprecisely. This analysis allows for a discussion on the potential integration of the MediaPipe algorithm into medical procedures.
- Europe > Croatia > Zagreb County > Zagreb (0.05)
- Asia > Middle East > Iraq > Nineveh Governorate > Mosul (0.04)
- Asia > India (0.04)
- (2 more...)
- Health & Medicine > Surgery (1.00)
- Health & Medicine > Therapeutic Area > Neurology (0.49)
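The study above quantifies landmark precision via the standard deviation of repeated detections of a fixed phantom. A minimal sketch of that kind of dispersion metric (the function and the distance-to-mean formulation are my assumptions, not the paper's exact protocol):

```python
import numpy as np

def landmark_dispersion(detections):
    """Per-landmark dispersion across repeated detections of a static target.

    detections: (R, N, 2) landmark sets from R repeated detections of the
    same fixed phantom. Returns, per landmark, the standard deviation of
    its Euclidean distance from its mean position -- a proxy for
    localization precision (0 means perfectly repeatable detection).
    """
    mean_pos = detections.mean(axis=0)                     # (N, 2) mean positions
    dists = np.linalg.norm(detections - mean_pos, axis=2)  # (R, N) deviations
    return dists.std(axis=0)                               # (N,) dispersion
```

Sweeping the robot arm through yaw and pitch angles and evaluating this metric per pose would reproduce the kind of angle-versus-precision analysis the abstract describes.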
FreeEnricher: Enriching Face Landmarks without Additional Cost
Huang, Yangyu, Chen, Xi, Kim, Jongyoo, Yang, Hao, Li, Chong, Yang, Jiaolong, Chen, Dong
Recent years have witnessed significant growth in face alignment. Though dense facial landmarks are in high demand in various scenarios, e.g., cosmetic medicine and facial beautification, most works only consider sparse face alignment. To address this problem, we present a framework that can enrich landmark density from existing sparse landmark datasets, e.g., 300W with 68 points and WFLW with 98 points. Firstly, we observe that the local patches along each semantic contour are highly similar in appearance. Then, we propose a weakly-supervised idea of learning the refinement ability on original sparse landmarks and adapting this ability to enriched dense landmarks. Meanwhile, several operators are devised and organized together to implement the idea. Finally, the trained model is applied as a plug-and-play module to existing face alignment networks. To evaluate our method, we manually label the dense landmarks on the 300W test set.
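FreeEnricher densifies landmarks along semantic contours and then refines them with a learned module. As a hedged illustration of only the first, geometric step (simple linear interpolation between sparse points; the actual method learns a refinement on top of such initial placements), one might do:

```python
import numpy as np

def densify_contour(sparse_pts, points_per_gap):
    """Insert evenly spaced points between consecutive contour landmarks.

    sparse_pts: (K, 2) landmarks along one semantic contour (e.g. jawline).
    points_per_gap: number of new points to insert in each segment.
    Returns the enriched contour, still ordered along the curve.
    """
    dense = []
    for a, b in zip(sparse_pts[:-1], sparse_pts[1:]):
        dense.append(a)
        for s in range(1, points_per_gap + 1):
            t = s / (points_per_gap + 1)
            dense.append((1 - t) * a + t * b)  # linear interpolation
    dense.append(sparse_pts[-1])
    return np.array(dense)
```

With 68-point 300W annotations, applying this per contour yields an arbitrarily dense initialization that a refinement network can then snap to the true facial boundary.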
Facial Landmark Visualization and Emotion Recognition Through Neural Networks
Juárez-Jiménez, Israel, Paredes, Tiffany Guadalupe Martínez, García-Ramírez, Jesús, Aguilar, Eric Ramos
Emotion recognition from facial images is a crucial task in human-computer interaction, enabling machines to learn human emotions through facial expressions. Previous studies have shown that facial images can be used to train deep learning models; however, most of these studies do not include a thorough dataset analysis. Visualizing facial landmarks can be challenging when extracting meaningful dataset insights; to address this issue, we propose facial landmark box plots, a visualization technique designed to identify outliers in facial datasets. Additionally, we compare two sets of facial landmark features: (i) the landmarks' absolute positions and (ii) their displacements from a neutral expression to the peak of an emotional expression. Our results indicate that a neural network achieves better performance than a random forest classifier.
- North America > Mexico > Tlaxcala (0.05)
- Asia > Middle East > Israel (0.04)
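The "facial landmark box plots" above use box-plot statistics to flag dataset outliers. A minimal sketch of the standard box-plot (Tukey IQR) rule applied per landmark coordinate (the function name and the per-coordinate layout are assumptions about how such a check might be wired up, not the authors' exact visualization):

```python
import numpy as np

def landmark_outliers(coords, k=1.5):
    """Flag outlier samples per landmark coordinate via the box-plot rule.

    coords: (S, N) one coordinate (e.g. x) of N landmarks over S samples.
    A value is an outlier if it falls outside [Q1 - k*IQR, Q3 + k*IQR],
    the same fences a box plot draws as whiskers. Returns a boolean
    (S, N) mask marking outlying samples.
    """
    q1 = np.percentile(coords, 25, axis=0)
    q3 = np.percentile(coords, 75, axis=0)
    iqr = q3 - q1
    return (coords < q1 - k * iqr) | (coords > q3 + k * iqr)
```

Rows flagged across many landmarks would be candidates for mislabeled or badly detected faces, which is the kind of dataset insight the abstract argues is usually skipped.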
From Deception to Perception: The Surprising Benefits of Deepfakes for Detecting, Measuring, and Mitigating Bias
Liu, Yizhi, Padmanabhan, Balaji, Viswanathan, Siva
Individuals from minority groups, even with equivalent qualifications, consistently receive fewer opportunities in critical areas such as employment, education, and healthcare. Yet, empirically demonstrating the existence of such pervasive bias, let alone measuring the extent of bias or correcting it, remains a significant challenge. Over several decades, researchers have utilized a range of experimental methodologies to test for biases in real-life situations (Bertrand and Duflo 2017). Audit studies, among the earliest of such methods, match two individuals who are similar in all respects except for sensitive characteristics like race, to test decision-makers' biases (Ayres and Siegelman 1995). A significant limitation of this method, however, is the inherent impossibility of achieving an exact match between two individuals, precluding perfect comparability (Heckman 1998). Correspondence studies have emerged as a predominant experimental approach for measuring biases (Guryan and Charles 2013, Bertrand and Mullainathan 2004). They create identical fictional profiles with manipulated attributes like race to assess differential treatment. However, these studies traditionally manipulate solely textual information, which may not reflect contemporary decision-making scenarios increasingly influenced by visual cues like facial images, as seen in recent hiring processes (Acquisti and Fong 2020, Ruffle and Shtudiner 2015). This reliance on text limits their effectiveness, as modern contexts often involve multimedia elements, making it challenging to measure real-world biases accurately or correct them based on such incomplete information (Armbruster et al. 2015).
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- Asia > China (0.04)
- Research Report > New Finding (0.94)
- Research Report > Experimental Study (0.69)
- Health & Medicine > Therapeutic Area (0.94)
- Law (0.93)
- Information Technology > Security & Privacy (0.67)
Review for NeurIPS paper: Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition
Additional Feedback: The work is a good incremental step towards understanding the relationship between AU and FER, and their influence in detecting one from the other. Figure 1: I am assuming that the dotted lines represent back-propagation steps for each module. Please clarify this in the manuscript/figure. Sec 3.1: The explanation of using generic knowledge as probabilities is not unique ([b]), and the use of only 8 AUs (there are many more) is not justified. When generating Table 1, it is important to note that these numbers are taken from studies which explored more AUs than are mentioned in the table.